Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 652
Missing cells (%): 9.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

Numeric: 8
Categorical: 1

Warnings

pregnancies is highly correlated with age (High correlation)
glucose is highly correlated with insulin (High correlation)
skin_thickness is highly correlated with bmi (High correlation)
insulin is highly correlated with glucose (High correlation)
bmi is highly correlated with skin_thickness (High correlation)
age is highly correlated with pregnancies (High correlation)
target is highly correlated with glucose (High correlation)
blood_pressure is highly correlated with bmi (High correlation)
glucose is highly correlated with target and 1 other field (High correlation)
diabetes_pedigree_func is highly correlated with bmi (High correlation)
bmi is highly correlated with blood_pressure and 2 other fields (High correlation)
blood_pressure has 35 (4.6%) missing values (Missing)
skin_thickness has 227 (29.6%) missing values (Missing)
insulin has 374 (48.7%) missing values (Missing)
bmi has 11 (1.4%) missing values (Missing)
pregnancies has 111 (14.5%) zeros (Zeros)
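
The missing-value figures in this overview can be cross-checked with a little arithmetic. The sketch below is illustrative only: the counts are copied from this report (the glucose count of 5 comes from that variable's own section), and the names `MISSING_COUNTS` and `percent` are made up for the example.

```python
# Counts copied from the report; the helper names are illustrative assumptions.
N_ROWS = 768
N_COLUMNS = 9
MISSING_COUNTS = {
    "glucose": 5,
    "blood_pressure": 35,
    "skin_thickness": 227,
    "insulin": 374,
    "bmi": 11,
}
PREGNANCY_ZEROS = 111


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Total missing cells and their share of all cells (rows x columns).
missing_cells = sum(MISSING_COUNTS.values())
missing_cells_pct = percent(missing_cells, N_ROWS * N_COLUMNS)

# Per-column shares are relative to the number of rows.
per_column_pct = {name: percent(count, N_ROWS) for name, count in MISSING_COUNTS.items()}
zeros_pct = percent(PREGNANCY_ZEROS, N_ROWS)

print(missing_cells, missing_cells_pct)
print(per_column_pct)
print(zeros_pct)
```

The per-column percentages are taken relative to the 768 observations, while the overall "Missing cells (%)" is relative to all 768 x 9 cells.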

Reproduction

Analysis started: 2021-11-12 17:27:24.519176
Analysis finished: 2021-11-12 17:27:34.481286
Duration: 9.96 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.845052083
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 6
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.369578063
Coefficient of variation (CV): 0.8763413316
Kurtosis: 0.1592197775
Mean: 3.845052083
Median Absolute Deviation (MAD): 2
Skewness: 0.9016739792
Sum: 2953
Variance: 11.35405632
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)

Common values

Value               Count  Frequency (%)
1                   135    17.6%
0                   111    14.5%
2                   103    13.4%
3                   75     9.8%
4                   68     8.9%
5                   57     7.4%
6                   50     6.5%
7                   45     5.9%
8                   38     4.9%
9                   28     3.6%
Other values (7)    58     7.6%

Minimum 10 values

Value  Count  Frequency (%)
0      111    14.5%
1      135    17.6%
2      103    13.4%
3      75     9.8%
4      68     8.9%
5      57     7.4%
6      50     6.5%
7      45     5.9%
8      38     4.9%
9      28     3.6%

Maximum 10 values

Value  Count  Frequency (%)
17     1      0.1%
15     1      0.1%
14     2      0.3%
13     10     1.3%
12     9      1.2%
11     11     1.4%
10     24     3.1%
9      28     3.6%
8      38     4.9%
7      45     5.9%
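
Because the value counts for pregnancies cover every distinct value (the ten most common values plus the seven largest), the column can be rebuilt and the summary statistics above recomputed. The following is a minimal sketch using only the Python standard library. It assumes linearly interpolated quartiles and the sample (n - 1) standard deviation, and the names `PREGNANCY_COUNTS`, `expand` and `summarize` are illustrative, not part of the report.

```python
import statistics

# value -> number of rows, read from the frequency tables in this section.
PREGNANCY_COUNTS = {
    0: 111, 1: 135, 2: 103, 3: 75, 4: 68, 5: 57, 6: 50, 7: 45, 8: 38,
    9: 28, 10: 24, 11: 11, 12: 9, 13: 10, 14: 2, 15: 1, 17: 1,
}


def expand(counts):
    """Rebuild the sorted column from a value -> count mapping."""
    return [value for value in sorted(counts) for _ in range(counts[value])]


def summarize(values):
    """Recompute a handful of the report's statistics for one numeric column."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # method="inclusive" interpolates linearly between order statistics.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Median absolute deviation: median distance from the median.
    mad = statistics.median(abs(v - median) for v in values)
    return {
        "sum": sum(values),
        "mean": mean,
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "stdev": stdev,
        "variance": stdev ** 2,
        "cv": stdev / mean,  # coefficient of variation
        "mad": mad,
    }


summary = summarize(expand(PREGNANCY_COUNTS))
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same approach does not work for the other numeric variables in this report, because their frequency tables list only a subset of the distinct values.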

glucose
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 135
Distinct (%): 17.7%
Missing: 5
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 121.6867628
Minimum: 44
Maximum: 199
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 44
5-th percentile: 80
Q1: 99
Median: 117
Q3: 141
95-th percentile: 181
Maximum: 199
Range: 155
Interquartile range (IQR): 42

Descriptive statistics

Standard deviation: 30.53564107
Coefficient of variation (CV): 0.2509364238
Kurtosis: -0.2770397069
Mean: 121.6867628
Median Absolute Deviation (MAD): 20
Skewness: 0.5309885349
Sum: 92847
Variance: 932.4253757
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
99                  17     2.2%
100                 17     2.2%
111                 14     1.8%
129                 14     1.8%
125                 14     1.8%
106                 14     1.8%
112                 13     1.7%
108                 13     1.7%
95                  13     1.7%
105                 13     1.7%
Other values (125)  621    80.9%

Minimum 10 values

Value  Count  Frequency (%)
44     1      0.1%
56     1      0.1%
57     2      0.3%
61     1      0.1%
62     1      0.1%
65     1      0.1%
67     1      0.1%
68     3      0.4%
71     4      0.5%
72     1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
199    1      0.1%
198    1      0.1%
197    4      0.5%
196    3      0.4%
195    2      0.3%
194    3      0.4%
193    2      0.3%
191    1      0.1%
190    1      0.1%
189    4      0.5%

blood_pressure
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 46
Distinct (%): 6.3%
Missing: 35
Missing (%): 4.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.40518417
Minimum: 24
Maximum: 122
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 24
5-th percentile: 52
Q1: 64
Median: 72
Q3: 80
95-th percentile: 92
Maximum: 122
Range: 98
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.38215821
Coefficient of variation (CV): 0.1710120394
Kurtosis: 0.9111578979
Mean: 72.40518417
Median Absolute Deviation (MAD): 8
Skewness: 0.1341527317
Sum: 53073
Variance: 153.3178419
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)

Common values

Value               Count  Frequency (%)
70                  57     7.4%
74                  52     6.8%
78                  45     5.9%
68                  45     5.9%
72                  44     5.7%
64                  43     5.6%
80                  40     5.2%
76                  39     5.1%
60                  37     4.8%
62                  34     4.4%
Other values (36)   297    38.7%
(Missing)           35     4.6%

Minimum 10 values

Value  Count  Frequency (%)
24     1      0.1%
30     2      0.3%
38     1      0.1%
40     1      0.1%
44     4      0.5%
46     2      0.3%
48     5      0.7%
50     13     1.7%
52     11     1.4%
54     11     1.4%

Maximum 10 values

Value  Count  Frequency (%)
122    1      0.1%
114    1      0.1%
110    3      0.4%
108    2      0.3%
106    3      0.4%
104    2      0.3%
102    1      0.1%
100    3      0.4%
98     3      0.4%
96     4      0.5%

skin_thickness
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 50
Distinct (%): 9.2%
Missing: 227
Missing (%): 29.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.15341959
Minimum: 7
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 7
5-th percentile: 13
Q1: 22
Median: 29
Q3: 36
95-th percentile: 46
Maximum: 99
Range: 92
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 10.47698237
Coefficient of variation (CV): 0.3593740465
Kurtosis: 2.935491262
Mean: 29.15341959
Median Absolute Deviation (MAD): 7
Skewness: 0.690619014
Sum: 15772
Variance: 109.7671596
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
32                  31     4.0%
30                  27     3.5%
27                  23     3.0%
23                  22     2.9%
33                  20     2.6%
28                  20     2.6%
18                  20     2.6%
31                  19     2.5%
19                  18     2.3%
39                  18     2.3%
Other values (40)   323    42.1%
(Missing)           227    29.6%

Minimum 10 values

Value  Count  Frequency (%)
7      2      0.3%
8      2      0.3%
10     5      0.7%
11     6      0.8%
12     7      0.9%
13     11     1.4%
14     6      0.8%
15     14     1.8%
16     6      0.8%
17     14     1.8%

Maximum 10 values

Value  Count  Frequency (%)
99     1      0.1%
63     1      0.1%
60     1      0.1%
56     1      0.1%
54     2      0.3%
52     2      0.3%
51     1      0.1%
50     3      0.4%
49     3      0.4%
48     4      0.5%

insulin
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 185
Distinct (%): 47.0%
Missing: 374
Missing (%): 48.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 155.5482234
Minimum: 14
Maximum: 846
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 14
5-th percentile: 41.65
Q1: 76.25
Median: 125
Q3: 190
95-th percentile: 395.5
Maximum: 846
Range: 832
Interquartile range (IQR): 113.75

Descriptive statistics

Standard deviation: 118.7758552
Coefficient of variation (CV): 0.7635950616
Kurtosis: 6.370521815
Mean: 155.5482234
Median Absolute Deviation (MAD): 55
Skewness: 2.166463844
Sum: 61286
Variance: 14107.70378
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
105                 11     1.4%
130                 9      1.2%
140                 9      1.2%
120                 8      1.0%
94                  7      0.9%
180                 7      0.9%
100                 7      0.9%
135                 6      0.8%
115                 6      0.8%
110                 6      0.8%
Other values (175)  318    41.4%
(Missing)           374    48.7%

Minimum 10 values

Value  Count  Frequency (%)
14     1      0.1%
15     1      0.1%
16     1      0.1%
18     2      0.3%
22     1      0.1%
23     2      0.3%
25     1      0.1%
29     1      0.1%
32     1      0.1%
36     3      0.4%

Maximum 10 values

Value  Count  Frequency (%)
846    1      0.1%
744    1      0.1%
680    1      0.1%
600    1      0.1%
579    1      0.1%
545    1      0.1%
543    1      0.1%
540    1      0.1%
510    1      0.1%
495    2      0.3%

bmi
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 247
Distinct (%): 32.6%
Missing: 11
Missing (%): 1.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.45746367
Minimum: 18.2
Maximum: 67.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 18.2
5-th percentile: 22.2
Q1: 27.5
Median: 32.3
Q3: 36.6
95-th percentile: 44.5
Maximum: 67.1
Range: 48.9
Interquartile range (IQR): 9.1

Descriptive statistics

Standard deviation: 6.924988332
Coefficient of variation (CV): 0.2133558063
Kurtosis: 0.8633790278
Mean: 32.45746367
Median Absolute Deviation (MAD): 4.6
Skewness: 0.5939697506
Sum: 24570.3
Variance: 47.9554634
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
32.4                10     1.3%
33.3                10     1.3%
30.1                9      1.2%
32.8                9      1.2%
32.9                9      1.2%
30.8                9      1.2%
33.6                8      1.0%
Other values (237)  656    85.4%
(Missing)           11     1.4%

Minimum 10 values

Value  Count  Frequency (%)
18.2   3      0.4%
18.4   1      0.1%
19.1   1      0.1%
19.3   1      0.1%
19.4   1      0.1%
19.5   2      0.3%
19.6   3      0.4%
19.9   1      0.1%
20     1      0.1%
20.1   1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
67.1   1      0.1%
59.4   1      0.1%
57.3   1      0.1%
55     1      0.1%
53.2   1      0.1%
52.9   1      0.1%
52.3   2      0.3%
50     1      0.1%
49.7   1      0.1%
49.6   1      0.1%

diabetes_pedigree_func
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 517
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4718763021
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14035
Q1: 0.24375
Median: 0.3725
Q3: 0.62625
95-th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.331328595
Coefficient of variation (CV): 0.7021513764
Kurtosis: 5.594953528
Mean: 0.4718763021
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.919911066
Sum: 362.401
Variance: 0.1097786379
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0.258               6      0.8%
0.254               6      0.8%
0.268               5      0.7%
0.207               5      0.7%
0.261               5      0.7%
0.259               5      0.7%
0.238               5      0.7%
0.19                4      0.5%
0.263               4      0.5%
0.299               4      0.5%
Other values (507)  719    93.6%

Minimum 10 values

Value  Count  Frequency (%)
0.078  1      0.1%
0.084  1      0.1%
0.085  2      0.3%
0.088  2      0.3%
0.089  1      0.1%
0.092  1      0.1%
0.096  1      0.1%
0.1    1      0.1%
0.101  1      0.1%
0.102  1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
2.42   1      0.1%
2.329  1      0.1%
2.288  1      0.1%
2.137  1      0.1%
1.893  1      0.1%
1.781  1      0.1%
1.731  1      0.1%
1.699  1      0.1%
1.698  1      0.1%
1.6    1      0.1%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 52
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.24088542
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
Median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76023154
Coefficient of variation (CV): 0.3537881556
Kurtosis: 0.6431588885
Mean: 33.24088542
Median Absolute Deviation (MAD): 7
Skewness: 1.129596701
Sum: 25529
Variance: 138.3030459
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
22                  72     9.4%
21                  63     8.2%
25                  48     6.2%
24                  46     6.0%
23                  38     4.9%
28                  35     4.6%
26                  33     4.3%
27                  32     4.2%
29                  29     3.8%
31                  24     3.1%
Other values (42)   348    45.3%

Minimum 10 values

Value  Count  Frequency (%)
21     63     8.2%
22     72     9.4%
23     38     4.9%
24     46     6.0%
25     48     6.2%
26     33     4.3%
27     32     4.2%
28     35     4.6%
29     29     3.8%
30     21     2.7%

Maximum 10 values

Value  Count  Frequency (%)
81     1      0.1%
72     1      0.1%
70     1      0.1%
69     2      0.3%
68     1      0.1%
67     3      0.4%
66     4      0.5%
65     3      0.4%
64     1      0.1%
63     4      0.5%

target
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB
0: 500
1: 268

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 768
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring characters

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  768    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  768    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  768    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Interactions

[Figure: 64 interaction plots, one for each ordered pair of the 8 numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
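
As a rough illustration of that calculation, here is a minimal pure-Python sketch. The function name `pearson_r` is an assumption made for this example, and the input lists are the pregnancies and age values from the first ten sample rows of this report, so the result says nothing about the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Values from the first ten sample rows (illustrative subset only).
pregnancies = [6, 1, 8, 1, 0, 5, 3, 10, 2, 8]
age = [50, 31, 32, 21, 33, 30, 26, 29, 53, 54]

r = pearson_r(pregnancies, age)
# Shifting and rescaling one variable (positive scale) leaves r unchanged.
r_rescaled = pearson_r([2 * p + 7 for p in pregnancies], age)
print(r, r_rescaled)
```

Pairs with a missing value would have to be dropped before calling the function; the sketch does not handle them.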

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
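
The rank-then-correlate recipe can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the report's implementation: `average_ranks`, `pearson_r` and `spearman_rho` are names chosen for the example, and ties are assumed to receive the mean of the ranks they span.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho is 1 while r stays below 1.
xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

The cubic example shows the difference the text describes: the ranks of both variables agree exactly, so ρ reaches 1 even though the relationship is not linear.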

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
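
The pair-counting definition above corresponds to what is usually called tau-a. The sketch below implements exactly that definition in pure Python; the function name `kendall_tau_a` is an assumption for this example, and tie-corrected variants such as tau-b, which statistical libraries often use, can give different values when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    concordant = 0
    discordant = 0
    total = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        total += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither.
    return (concordant - discordant) / total


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The loop visits every pair of observations, so the cost grows quadratically with the number of rows; that is fine for a dataset of this size.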

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
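
One way to make the nullity correlation concrete is to correlate per-column missingness indicators. The sketch below assumes that definition (Pearson correlation between 0/1 "is missing" flags), which may differ in detail from what the report's plotting code does. It uses the skin_thickness and insulin values from the first ten sample rows, with `None` standing in for NaN; the helper names are illustrative.

```python
import math


def missing_flags(column):
    """Return 1 where a value is missing (None) and 0 where it is present."""
    return [1 if value is None else 0 for value in column]


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def nullity_correlation(column_a, column_b):
    """Correlation between the missingness patterns of two columns."""
    return pearson_r(missing_flags(column_a), missing_flags(column_b))


# First ten sample rows of the report; None marks a missing value.
skin_thickness = [35.0, 29.0, None, 23.0, 35.0, None, 32.0, None, 45.0, None]
insulin = [None, None, None, 94.0, 168.0, None, 88.0, None, 543.0, None]

print(nullity_correlation(skin_thickness, insulin))
```

A value near +1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly when the other is absent. Columns with no missing values (or only missing values) have a constant flag and no defined correlation, so they would need to be skipped.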

Sample

First rows

index  pregnancies  glucose  blood_pressure  skin_thickness  insulin  bmi   diabetes_pedigree_func  age  target
0      6            148.0    72.0            35.0            NaN      33.6  0.627                   50   1
1      1            85.0     66.0            29.0            NaN      26.6  0.351                   31   0
2      8            183.0    64.0            NaN             NaN      23.3  0.672                   32   1
3      1            89.0     66.0            23.0            94.0     28.1  0.167                   21   0
4      0            137.0    40.0            35.0            168.0    43.1  2.288                   33   1
5      5            116.0    74.0            NaN             NaN      25.6  0.201                   30   0
6      3            78.0     50.0            32.0            88.0     31.0  0.248                   26   1
7      10           115.0    NaN             NaN             NaN      35.3  0.134                   29   0
8      2            197.0    70.0            45.0            543.0    30.5  0.158                   53   1
9      8            125.0    96.0            NaN             NaN      NaN   0.232                   54   1
98125.096.0NaNNaNNaN0.232541

Last rows

index  pregnancies  glucose  blood_pressure  skin_thickness  insulin  bmi   diabetes_pedigree_func  age  target
758    1            106.0    76.0            NaN             NaN      37.5  0.197                   26   0
759    6            190.0    92.0            NaN             NaN      35.5  0.278                   66   1
760    2            88.0     58.0            26.0            16.0     28.4  0.766                   22   0
761    9            170.0    74.0            31.0            NaN      44.0  0.403                   43   1
762    9            89.0     62.0            NaN             NaN      22.5  0.142                   33   0
763    10           101.0    76.0            48.0            180.0    32.9  0.171                   63   0
764    2            122.0    70.0            27.0            NaN      36.8  0.340                   27   0
765    5            121.0    72.0            23.0            112.0    26.2  0.245                   30   0
766    1            126.0    60.0            NaN             NaN      30.1  0.349                   47   1
767    1            93.0     70.0            31.0            NaN      30.4  0.315                   23   0